Abstract

Kiers (1991) considered the orthogonal rotation in PCAMIX, a principal component method for
a mixture of qualitative and quantitative variables. PCAMIX includes ordinary principal
component analysis (PCA) and multiple correspondence analysis (MCA) as special cases. In this
paper, we give a new presentation of PCAMIX in which the principal components and the squared
loadings are obtained from a Singular Value Decomposition. The loadings of the quantitative
variables and the principal coordinates of the categories of the qualitative variables are
also obtained directly. In this context, we propose a computationally efficient procedure for
varimax rotation in PCAMIX and a direct solution for the optimal angle of rotation. A
simulation study shows the good computational behavior of the proposed algorithm. An
application to a real data set illustrates the interest of using rotation in MCA. All source
code is available in the R package "PCAmixdata".

Keywords: mixture of qualitative and quantitative data, principal component analysis,
multiple correspondence analysis, rotation.

1 Introduction 






Kaiser (1958) introduced the varimax criterion for the attainment of simple structures by
orthogonal rotation in Principal Component Analysis (PCA). This criterion aims at maximizing
the sum over the columns of the variances of the squared elements of the loading matrix. The
loading matrix plays a significant part in the interpretation of the results since it
contains the correlations between the variables and the principal components. The idea is to
obtain components that are easier to interpret, that is, to rotate the loading matrix and the
standardized principal components so that groups of variables appear: each variable has high
loadings on the same component, moderate ones on a few components and negligible ones on the
remaining components. Because the Singular Value Decomposition (SVD) solution in PCA is
determined only up to an orthogonal rotation, rotation redistributes the percentage of
variance explained along the newly rotated axes while conserving the variance explained by
the solution as a whole.

Kiers (1991) extended the varimax criterion for the attainment of simple structures to
PCAMIX, a principal component method for a mixture of qualitative and quantitative
variables. For qualitative variables, the coefficient used to express the link between a
variable and a component is the correlation ratio; this correlation ratio plays the role of a
squared loading. The varimax criterion is then expressed with squared loadings defined as
correlation ratios for qualitative variables and squared correlations for quantitative
variables. Algorithms devised for the determination of an optimal orthogonal rotation in the
context of PCA, as proposed for example by Kaiser (1958), Neudecker (1981) or Jennrich
(2001), do not apply to this extended varimax criterion. Kiers (1991) therefore proposes a
matrix reformulation of this new varimax criterion in order to replace the optimization
problem with a problem of simultaneous diagonalization of a set of symmetric matrices (ten
Berge, 1984), and suggests the use of the algorithm of de Leeuw and Pruzansky (1978) to solve
the latter. To the best of our knowledge, the resulting algorithm has never been presented in
a single paper, so we recall, for comparison purposes, the main steps of the matrix
reformulation and of the simultaneous diagonalization. We shall refer to this algorithm as
Kiers' (1991) original approach to PCAMIX.

In this paper we first present a new formulation of PCAMIX. It is similar to that of
Escofier (1979) and Pagès (2004) in the way quantitative and qualitative variables are
transformed, but it is presented via an SVD. This gives a direct way to determine the
component scores and the squared loadings, and also the principal coordinates of the
categories of the qualitative variables as well as the loadings of the quantitative
variables. We then search for an optimal rotation for the PCAMIX varimax criterion using the
iterative procedure suggested by Kaiser (1958) for PCA: we rotate pairs of dimensions
according to an optimal angle $\theta$, iteratively until the process converges. A new direct
determination of this angle, specific to PCAMIX, is proposed. We shall refer to the resulting
algorithm as the SVD approach to PCAMIX. This algorithm leads to the same final rotation as
Kiers' (1991) original approach; however, a simulation study shows that it is computationally
more efficient. When all the variables are quantitative, the new algorithm reduces to the
classical procedure of Kaiser (1958) for orthogonal rotation in PCA with a new direct
expression of the optimal planar angle $\theta$.

Notice that Kaiser's varimax rotation procedure does not always produce an optimal
rotation in PCA. ten Berge (1995) made suggestions for addressing this point for PCA. This
is an open problem for PCAMIX.

This paper is organized as follows. Section 2 recalls Kiers' original PCAMIX method and
proposes an alternative formulation using the SVD. Section 3 deals with varimax rotation in
PCAMIX. The optimization problem is given in Section 3.1. The determination of the optimal
angle of rotation with Kiers' matrix reformulation approach is described in Section 3.2.1 for
purposes of comparison with the direct solution proposed in Section 3.2.2. The complete
procedure for orthogonal rotation in more than two dimensions is given in Section 3.3. In
Section 4.1, a simulation study compares the computational time of the proposed rotation
procedure with that of the rotation procedure based on Kiers (1991). In Section 4.2, a real
data application illustrates the interest of rotation in MCA and shows some of the outputs
and graphical representations available in the R package "PCAmixdata" that we have developed.



2 The PCAMIX method 

Let us first introduce some notation used in the presentation of the PCAMIX method.

• Let $n$ denote the number of observation units, $p_1$ the number of quantitative variables,
$p_2$ the number of qualitative variables and $p = p_1 + p_2$ the total number of variables.

• Let $z_j$ be the column vector which contains the standardized scores of the $n$ objects
on variable $j$ if the $j$-th variable is quantitative.

• Let $G_j$ be the indicator matrix for the variable $j$ if the $j$-th variable is qualitative
and let $D_j$ be the diagonal matrix of frequencies of categories of this variable.

• Let us denote by $m$ the total number of categories of the $p_2$ qualitative variables.

• Let $G = (G_1 | \cdots | G_j | \cdots | G_{p_2})$ be the $n \times m$ matrix of the indicator
variables of the $m$ categories of the $p_2$ qualitative variables and let
$D = \mathrm{diag}(D_1, \ldots, D_j, \ldots, D_{p_2})$ be the $m \times m$ diagonal matrix of
frequencies of the $m$ categories.

• Let $J = I_n - 11'/n$ be the centering operator where $I_n$ denotes the $n \times n$ identity
matrix and $1$ the vector of order $n$ with unit entries.

In the two following subsections, we give two formulations of the PCAMIX method and 
highlight their main differences. 

2.1 The original PCAMIX procedure 

Suppose $k$ is the number of components required in PCAMIX. In Kiers (1991), the procedure
computes the $n \times k$ matrix $X$ of the standardized component scores, the variance of
each component and the $p \times k$ matrix $C$ of the squared loadings. The squared loadings
are defined as squared correlations for quantitative variables and as correlation ratios for
qualitative variables. This procedure is carried out according to the following steps:

1. For $j = 1, \ldots, p$: calculate the so-called $n \times n$ quantification matrix $S_j$ with:

$$S_j = \begin{cases} \frac{1}{n} z_j z_j' & \text{if variable } j \text{ is quantitative,} \\ J G_j D_j^{-1} G_j' J & \text{if variable } j \text{ is qualitative.} \end{cases}$$

2. Calculate the $n \times n$ matrix $S = \sum_{j=1}^{p} S_j$.

3. Perform an Eigenvalue Decomposition of $S$. The matrix $X$ of the standardized component
scores is given by the first $k$ eigenvectors of $S$ normalized to $n$ (such that
$X'X = n I_k$).

4. For $l = 1, \ldots, k$: calculate the variance of the $l$-th component given by $x_l' S x_l$
where $x_l$ denotes the $l$-th column of $X$.

5. Calculate the matrix $C$ of the squared loadings of the $p$ variables on the $k$ components
with $c_{jl} = \frac{1}{n} x_l' S_j x_l$. For quantitative (resp. qualitative) variables,
$c_{jl}$ is the squared correlation (resp. correlation ratio) between the variable $j$ and the
component $l$.

When all the variables are quantitative (resp. qualitative), this procedure is equivalent to
PCA (resp. MCA). But the loadings (the correlations between the variables and the
components) and the principal coordinates of the categories (the barycenters of the component
scores) are not directly provided and must be calculated afterwards if desired. From a
practical point of view, this procedure requires the construction and the storage of $p$
matrices of dimension $n \times n$, which can lead to memory problems when $n$ and $p$
increase.

2.2 The SVD based PCAMIX procedure 

This procedure is carried out according to the following steps:

1. Determine the $n \times (p_1 + m)$ matrix of interest $Z = \frac{1}{\sqrt{n}} (Z_1 | Z_2)$ where:

• $Z_1 = (z_1 | \cdots | z_j | \cdots | z_{p_1})$ is the $n \times p_1$ matrix of the
standardized scores of the $n$ observation units (objects) on the $p_1$ quantitative variables.

• $Z_2$ is the $n \times m$ matrix obtained by recoding $G$ in the following way:
$Z_2 = J G D^{-1/2}$.

2. Perform the SVD of $Z$:

$$Z = U \Lambda V', \tag{1}$$

where $U'U = V'V = I_r$, $\Lambda$ is the diagonal matrix of singular values (in weakly
descending order) and $r$ is the rank of $Z$.

3. Calculate the $n \times k$ matrix of the standardized component scores:

$$X = \sqrt{n}\, U_k, \tag{2}$$

where $U_k$ denotes the matrix of the first $k$ columns of $U$.

4. For $l = 1, \ldots, k$, the standard deviation of the $l$-th component is given by the
$l$-th singular value in $\Lambda$.

5. Calculate the matrix:

$$A = V_k \Lambda_k, \tag{3}$$

where $V_k$ denotes the matrix of the first $k$ columns of $V$ and $\Lambda_k$ the diagonal
matrix of the $k$ largest singular values.

6. Write $A = \binom{A_1}{A_2}$ as the concatenation of a $p_1 \times k$ matrix $A_1$ and an
$m \times k$ matrix $A_2$.

• The matrix $A_1$ contains the loadings of the quantitative variables (the correlations
between the quantitative variables and the components).

• The matrix $DA_2$ contains the principal coordinates of the categories of the
qualitative variables.

• Calculate the matrix $C$ of the squared loadings of the $p$ variables on the $k$
components. This matrix is obtained from the matrix $A$ as follows:

$$c_{jl} = \begin{cases} a_{jl}^2 & \text{if variable } j \text{ is quantitative,} \\ \sum_{s \in I_j} a_{sl}^2 & \text{if variable } j \text{ is qualitative,} \end{cases}$$

where $I_j$ is the set of row indices of $A$ associated with the categories of the
qualitative variable $j$. To simplify the notation, we write hereafter
$c_{jl} = \sum_{s \in I_j} a_{sl}^2$ for both quantitative and qualitative variables, with
$I_j = \{j\}$ in the quantitative case.
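As a worked check of the claim that $A_1$ holds correlations, the short derivation below uses only the SVD of $Z$ and the definitions of $X$ and $A$. It is a sketch that assumes the $1/\sqrt{n}$ scaling of $Z$ in step 1 and strictly positive singular values for the $k$ retained components.

```latex
% From Z = U \Lambda V' we get Z'U = V \Lambda U'U = V \Lambda, hence
A = V_k \Lambda_k = Z' U_k = \frac{1}{\sqrt{n}} Z' X .
% Restricting to the first p_1 rows, where Z = \frac{1}{\sqrt{n}} (Z_1 | Z_2):
A_1 = \frac{1}{\sqrt{n}} \cdot \frac{1}{\sqrt{n}} Z_1' X = \frac{1}{n} Z_1' X ,
\qquad
a_{jl} = \frac{1}{n} z_j' x_l .
% z_j and x_l are both centered with unit variance,
% so a_{jl} is the correlation between variable j and component l.
```

The component $x_l$ is centered because it is a linear combination of the columns of $Z$, which are themselves centered.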

Note that the matrix $X$ of the standardized component scores is obtained from the SVD
of the recoded data matrix $Z$ whereas it was obtained from the Eigenvalue Decomposition of
the matrix $S$ (the sum of the quantification matrices $S_j$) in Kiers' original approach.
Also, the matrix $C$ of the squared loadings (squared correlations or correlation ratios
between the variables and the components) is calculated here from the matrix $A$ alone,
obtained with the SVD of $Z$, whereas it was calculated from the two matrices $X$ and $S_j$ in
Kiers' original approach.

Contrary to the original PCAMIX approach, this procedure simultaneously provides the
loadings of the quantitative variables and the principal coordinates of the categories of the
qualitative variables. Moreover, when the data are mixed (quantitative and qualitative), the
well-known barycentric property of MCA remains true: the coordinates of the categories
are the averages of the standardized component scores of the objects in those categories.
The matrices $X$, $A_1$ and $DA_2$ are then used to plot the observation units, the
quantitative variables and the categories with the same interpretation rules as in PCA and
MCA. Matrix $C$ is used to plot the quantitative and qualitative variables on the same graph.

3 Varimax rotation in PCAMIX 
3.1 The optimization problem 

Why use rotation? As shown by Eckart and Young (1936), from the SVD in (1) and the
definitions of matrices $X$ and $A$ given in (2) and (3), the matrix $XA'$ is a rank $k$ least
squares approximation of $Z$. Let us introduce an orthonormal rotation matrix $T$:
$TT' = T'T = I_k$. Let $\tilde X = XT$ and $\tilde A = AT$. As $XA' = \tilde X \tilde A'$, this
approximation is not unique over orthogonal rotations.

This non-uniqueness can be exploited to improve the interpretability of the original
solutions. To simplify the interpretation, the matrices $X$ and $A$ are rotated in such a
way that, for any given variable, few squared loadings are large (close to 1) and as
many as possible are close to zero.
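The invariance of the product of scores and loadings under rotation can be checked numerically. The sketch below uses small hypothetical matrices X and A with k = 2 and an arbitrary angle; the helper names (`matmul`, `transpose`, `planar_rotation`) are illustrative and are not taken from the PCAmixdata package.

```python
import math

def matmul(P, Q):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(P[i][h] * Q[h][j] for h in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def planar_rotation(theta):
    """2 x 2 rotation matrix T, which satisfies TT' = T'T = I."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

# Hypothetical scores X (3 units) and loadings A (3 rows), with k = 2.
X = [[1.2, -0.3], [-0.7, 0.9], [-0.5, -0.6]]
A = [[0.8, 0.1], [0.6, -0.5], [0.2, 0.7]]
T = planar_rotation(0.4)  # arbitrary angle, in radians

before = matmul(X, transpose(A))                       # X A'
after = matmul(matmul(X, T), transpose(matmul(A, T)))  # (XT)(AT)'

# Largest entrywise difference between the two products.
max_gap = max(abs(b - a) for rb, ra in zip(before, after)
              for b, a in zip(rb, ra))
print(max_gap)
```

The printed gap is at rounding-error level, which is why any rotation criterion can be optimized without degrading the rank $k$ approximation.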

The varimax problem. In PCA, since $\tilde A$ contains the loadings of the variables after
rotation, the varimax rotation problem is formulated as

$$\max_{T} f(T), \quad \text{s.t. } TT' = T'T = I_k, \tag{4}$$

where

$$f(T) = \sum_{l=1}^{k} \sum_{j=1}^{p} \left(\tilde a_{jl}^2\right)^2 - \frac{1}{p} \sum_{l=1}^{k} \left( \sum_{j=1}^{p} \tilde a_{jl}^2 \right)^2 \tag{5}$$

is the varimax function measuring the simplicity of the components after rotation.

In the SVD approach of PCAMIX, the varimax function $f$ is defined by replacing in (5)
the terms $\tilde a_{jl}^2$ by $\tilde c_{jl}$, where the
$\tilde c_{jl} = \sum_{s \in I_j} \tilde a_{sl}^2$ are the squared loadings after rotation:

$$f(T) = \sum_{l=1}^{k} \sum_{j=1}^{p} \tilde c_{jl}^2 - \frac{1}{p} \sum_{l=1}^{k} \left( \sum_{j=1}^{p} \tilde c_{jl} \right)^2. \tag{6}$$

Note that the squared loadings after rotation $\tilde c_{jl}$ are squared correlations (resp.
correlation ratios) between the quantitative (resp. qualitative) variables and the rotated
components.

For comparison purposes, we recall Kiers' original expression of the varimax function in
PCAMIX: the squared loadings after rotation $\tilde c_{jl}$ are given by
$\frac{1}{n} \tilde x_l' S_j \tilde x_l$, where $\tilde x_l$ denotes the $l$-th column of
$\tilde X$. Hence the varimax function (6) becomes:

$$f(T) = \sum_{l=1}^{k} \sum_{j=1}^{p} \left( \frac{1}{n} \tilde x_l' S_j \tilde x_l \right)^2 - \frac{1}{p} \sum_{l=1}^{k} \left( \sum_{j=1}^{p} \frac{1}{n} \tilde x_l' S_j \tilde x_l \right)^2. \tag{7}$$
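The name varimax can be read off the PCAMIX varimax function directly. The identity below is a standard rearrangement, not a new result; the symbol $\bar c_l$ (the mean squared loading on component $l$) is introduced here only for this illustration.

```latex
\bar c_l = \frac{1}{p} \sum_{j=1}^{p} \tilde c_{jl},
\qquad
f(T) = \sum_{l=1}^{k} \left[ \sum_{j=1}^{p} \tilde c_{jl}^{\,2} - p\, \bar c_l^{\,2} \right]
     = \sum_{l=1}^{k} \sum_{j=1}^{p} \left( \tilde c_{jl} - \bar c_l \right)^2 .
```

In other words, $f(T)$ is $p$ times the sum over the components of the variance of the squared loadings, so maximizing it pushes the squared loadings within each column towards either large values or values close to zero.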

The iterative optimization procedure. Because a direct solution for the optimal $T$ is
not available, an iterative optimization procedure suggested by Kaiser (1958) for PCA can
be used for PCAMIX. The idea is to consider at each iteration a planar rotation for which
the rotation matrix $T$ depends only on an angle $\theta$ (see below for details). This
procedure rotates pairs of dimensions in the following way: the single-plane rotations are
applied to dimensions 1 and 2, 1 and 3, ..., 1 and $k$, 2 and 3, ..., $(k-1)$ and $k$,
iteratively until the process converges, i.e. until $k(k-1)/2$ successive rotations providing
an angle of rotation equal to zero are obtained.

The key point of this rotation procedure is the definition of the single-plane rotation step. 
We give next details on the calculation of the optimal angle for planar rotation. Then we 
give the complete iterative procedure for rotation in more than two dimensions. 

3.2 Planar rotation 

Single planar rotations are obtained with a rotation matrix $T$ defined by

$$T = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},$$

where $\theta$ is the angle of rotation. The varimax rotation problem (4) is then rewritten as:

$$\max_{\theta} f(\theta).$$

For purposes of comparison, we first recall the solution based on Kiers' matrix
reformulation before giving our direct solution.

3.2.1 Planar rotation using Kiers' matrix reformulation

Kiers (1991) proposes to use a procedure of simultaneous diagonalization of a set of
symmetric matrices (ten Berge, 1984; de Leeuw and Pruzansky, 1978) to solve the global
varimax optimization problem (4). For that purpose he gives the following matrix
reformulation of the formula (7) giving $f$:

$$f(T) = \sum_{j=1}^{p} \mathrm{Trace}\left( T' E_j T \, (\mathrm{Diag}\, T' E_j T) \right) \tag{9}$$

where

$$E_j = p\, X' S_j X - n \Gamma \tag{10}$$

and $\Gamma$ is the diagonal matrix with the first $k$ eigenvalues of $S$ on its diagonal.

Careful reading of ten Berge (1984) and de Leeuw and Pruzansky (1978) shows that
the procedure for simultaneous diagonalization of the matrices $E_j$ is equivalent to
Kaiser's iterative optimization procedure with the optimal angle $\theta$ of single-plane
rotations defined by the equation:

$$\tan(4\theta) = \frac{a}{b}, \tag{11}$$

where

$$a = 4 \sum_{j=1}^{p} e^{j}_{12} \left( e^{j}_{11} - e^{j}_{22} \right) \quad \text{and} \quad b = \sum_{j=1}^{p} \left( e^{j}_{11} - e^{j}_{22} \right)^2 - 4 \sum_{j=1}^{p} \left( e^{j}_{12} \right)^2 \tag{12}$$

and $E_j = \begin{pmatrix} e^{j}_{11} & e^{j}_{12} \\ e^{j}_{21} & e^{j}_{22} \end{pmatrix}$ is defined in (10).

As mentioned by several authors (see for instance Nevels, 1986; ten Berge, 1984; de
Leeuw and Pruzansky, 1978 and Kaiser, 1958), equation (11) is only a necessary condition
obtained upon setting the first order derivative of the objective function to zero. Both
Kaiser (1958) and de Leeuw and Pruzansky (1978) developed a procedure for determining
the optimal $\theta$ from the sign of the second order derivative of the objective function.
These two procedures, expressed in tabular form, give the appropriate solution for every
possible combination of signs of $a$ and $b$.

3.2.2 Planar rotation using the SVD approach of PCAMIX 

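The varimax function $f(T)$ defined with the SVD approach in (6) is written:

$$f(\theta) = \sum_{j=1}^{p} \Big( \sum_{s \in I_j} \tilde a_{s1}^2 \Big)^2 + \sum_{j=1}^{p} \Big( \sum_{s \in I_j} \tilde a_{s2}^2 \Big)^2 - \frac{1}{p} \Big( \sum_{j=1}^{p} \sum_{s \in I_j} \tilde a_{s1}^2 \Big)^2 - \frac{1}{p} \Big( \sum_{j=1}^{p} \sum_{s \in I_j} \tilde a_{s2}^2 \Big)^2 \tag{13}$$

with

$$\tilde a_{s1} = a_{s1} \cos(\theta) + a_{s2} \sin(\theta) \quad \text{and} \quad \tilde a_{s2} = -a_{s1} \sin(\theta) + a_{s2} \cos(\theta). \tag{14}$$

This function is equal to (see Appendix):

$$f(\theta) = f(0) + \frac{\rho}{4p} \left( \cos(4\theta - \psi) - \cos\psi \right) \tag{15}$$

where $\rho$ and $\psi$ are defined by:

$$\rho = (a^2 + b^2)^{1/2}, \quad \cos\psi = b/\rho, \quad \sin\psi = a/\rho \tag{16}$$

with $a$ and $b$ given by:

$$a = 2p \sum_{j=1}^{p} u_j v_j - 2 \sum_{j=1}^{p} u_j \sum_{j=1}^{p} v_j, \qquad b = p \sum_{j=1}^{p} (u_j^2 - v_j^2) - \Big( \sum_{j=1}^{p} u_j \Big)^2 + \Big( \sum_{j=1}^{p} v_j \Big)^2, \tag{17}$$

where $u_j$ and $v_j$ are defined by:

$$u_j = \sum_{s \in I_j} (a_{s1}^2 - a_{s2}^2) \quad \text{and} \quad v_j = 2 \sum_{s \in I_j} a_{s1} a_{s2}. \tag{18}$$

The function $f$ obtained in (15) is maximum for $\cos(4\theta - \psi) = 1 \Leftrightarrow 4\theta - \psi = 2k\pi$,
thus the optimal angles are:

$$\theta = \frac{\psi}{4} + k \frac{\pi}{2}, \quad k \in \mathbb{Z}. \tag{19}$$

Note that the above expressions of $u_j$ and $v_j$ contain as special cases (take
$I_j = \{j\}$) those defined by Kaiser (1958) for the PCA varimax solution. Note also that
the classical necessary condition (11) immediately follows by setting the expression (23) of
$p f'(\theta)$ given in the Appendix to zero (the coefficients $b$ and $a$ given by (12) on one
side, and by (17) and (18) on the other side, are proportional).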
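The direct solution can be sketched in a few lines of plain Python. The function names, the example loading matrix and its index sets (two quantitative variables and one qualitative variable with three categories) are hypothetical and serve only as an illustration; `math.atan2(a, b)` is used as a compact way of obtaining the angle $\psi$ whose cosine and sine are $b/\rho$ and $a/\rho$.

```python
import math

def optimal_angle(A, index_sets, l=0, t=1):
    """Planar angle theta = psi / 4 for columns l and t of the loading matrix A."""
    p = len(index_sets)
    # u_j and v_j aggregated over the rows I_j of each variable.
    u = [sum(A[s][l] ** 2 - A[s][t] ** 2 for s in I) for I in index_sets]
    v = [2.0 * sum(A[s][l] * A[s][t] for s in I) for I in index_sets]
    a = 2.0 * p * sum(x * y for x, y in zip(u, v)) - 2.0 * sum(u) * sum(v)
    b = p * sum(x * x - y * y for x, y in zip(u, v)) - sum(u) ** 2 + sum(v) ** 2
    return math.atan2(a, b) / 4.0  # psi has cos = b / rho and sin = a / rho

def rotate_pair(A, theta, l=0, t=1):
    """Return a copy of A with columns l and t rotated by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    out = [row[:] for row in A]
    for row in out:
        a1, a2 = row[l], row[t]
        row[l], row[t] = a1 * c + a2 * s, -a1 * s + a2 * c
    return out

def varimax(A, index_sets):
    """PCAMIX varimax function computed from the squared loadings c_jl."""
    p, k = len(index_sets), len(A[0])
    C = [[sum(A[s][l] ** 2 for s in I) for l in range(k)] for I in index_sets]
    return (sum(c * c for row in C for c in row)
            - sum(sum(row[l] for row in C) ** 2 for l in range(k)) / p)

# Hypothetical example: rows 0-1 are quantitative variables,
# rows 2-4 are the three categories of one qualitative variable.
A_example = [[0.7, 0.5], [0.6, -0.4], [0.3, 0.3], [0.2, -0.1], [-0.5, 0.2]]
sets_example = [[0], [1], [2, 3, 4]]

theta_star = optimal_angle(A_example, sets_example)
print(theta_star, varimax(rotate_pair(A_example, theta_star), sets_example))
```

Taking singleton index sets for every row gives the PCA case. No table of sign combinations is needed, because the angle is obtained from $a$ and $b$ in one step.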

3.3 The iterative rotation procedure. 

We consider now the case where the number $k$ of dimensions in the rotation is greater
than two. The iterative rotation procedure gives the matrix $\tilde X$ of the rotated
standardized component scores and the matrix $\tilde A$ which is used to obtain the rotated
squared loadings, the rotated loadings (correlations) of the quantitative variables and the
rotated principal coordinates of the categories. This procedure is carried out according to
the following steps:

1. Initialization: $\tilde X = X$ and $\tilde A = A$ where the $n \times k$ matrix $X$ and the
$(p_1 + m) \times k$ matrix $A$ are given by the SVD based PCAMIX procedure of Section 2.2.

2. For $l = 1, \ldots, k-1$ and $t = (l+1), \ldots, k$, calculate for the pair of dimensions $(l, t)$:

- the angle of rotation $\theta = \psi/4$ with $\psi$ defined in (16). We choose:

$$\psi = \begin{cases} \arccos\left( \dfrac{b}{\sqrt{a^2 + b^2}} \right) & \text{if } a \geq 0, \\ -\arccos\left( \dfrac{b}{\sqrt{a^2 + b^2}} \right) & \text{if } a < 0, \end{cases} \tag{20}$$

where $a$ and $b$ are defined in (17).

- the matrix of rotation $T = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$.

- the matrices $\tilde X$ and $\tilde A$ updated by rotation of their $l$-th and $t$-th columns.

3. Repeat the previous step until the $k(k-1)/2$ angles $\theta$ are equal to zero.

4. Calculate:

- the matrix $\tilde C$ with $\tilde c_{jl} = \sum_{s \in I_j} \tilde a_{sl}^2$.

- the matrix $\tilde A_1$ of the $p_1$ first rows of $\tilde A$ which contains the rotated
loadings of the quantitative variables.

- the matrix $\tilde A_2$ of the $m$ last rows of $\tilde A$ and the matrix $D \tilde A_2$
which contains the rotated principal coordinates of the categories of the qualitative
variables.
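The steps above can be sketched as a pairwise sweep in plain Python. This is an illustrative sketch rather than the PCAmixdata implementation: the function names, the tolerance `tol` used in place of an exact zero angle, and the safety bound `max_sweeps` are assumptions introduced here.

```python
import math

def pair_angle(A, index_sets, l, t):
    """Angle theta = psi / 4 for the pair of columns (l, t), psi via atan2."""
    p = len(index_sets)
    u = [sum(A[s][l] ** 2 - A[s][t] ** 2 for s in I) for I in index_sets]
    v = [2.0 * sum(A[s][l] * A[s][t] for s in I) for I in index_sets]
    a = 2.0 * p * sum(x * y for x, y in zip(u, v)) - 2.0 * sum(u) * sum(v)
    b = p * sum(x * x - y * y for x, y in zip(u, v)) - sum(u) ** 2 + sum(v) ** 2
    return math.atan2(a, b) / 4.0

def rotate_columns(M, l, t, theta):
    """In-place planar rotation of columns l and t of M."""
    c, s = math.cos(theta), math.sin(theta)
    for row in M:
        m1, m2 = row[l], row[t]
        row[l], row[t] = m1 * c + m2 * s, -m1 * s + m2 * c

def varimax_rotate(A, X, index_sets, tol=1e-10, max_sweeps=100):
    """Rotate loadings A and scores X; return rotated A, rotated X and C."""
    A = [row[:] for row in A]  # step 1: start from the unrotated solution
    X = [row[:] for row in X]
    k = len(A[0])
    for _ in range(max_sweeps):
        largest = 0.0
        for l in range(k - 1):            # step 2: visit every pair (l, t)
            for t in range(l + 1, k):
                theta = pair_angle(A, index_sets, l, t)
                rotate_columns(A, l, t, theta)
                rotate_columns(X, l, t, theta)
                largest = max(largest, abs(theta))
        if largest < tol:                 # step 3: all angles are (nearly) zero
            break
    # Step 4: squared loadings after rotation.
    C = [[sum(A[s][l] ** 2 for s in I) for l in range(k)] for I in index_sets]
    return A, X, C

# Hypothetical input with k = 3: two quantitative variables and one
# qualitative variable with three categories; X holds two observation rows.
A0 = [[0.7, 0.5, 0.1], [0.6, -0.4, 0.2], [0.3, 0.3, 0.5],
      [0.2, -0.1, 0.4], [-0.5, 0.2, 0.1]]
X0 = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
sets0 = [[0], [1], [2, 3, 4]]
A_rot, X_rot, C_rot = varimax_rotate(A0, X0, sets0)
```

Each planar step can only increase the criterion, and the sum of each variable's squared loadings over the $k$ components is unchanged by the rotation.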



The main differences between this procedure and that constructed with Kiers' matrix
reformulation are the following:

• The expressions of $a$ and $b$ in step (2): in this procedure they are expressed in terms
of the matrix $A$ of dimension $(p_1 + m) \times k$ where $p_1$ is the number of quantitative
variables and $m$ is the total number of categories. With Kiers' matrix reformulation,
$a$ and $b$ are expressed in terms of the $p$ matrices $S_j$ of dimension $n \times n$. The
calculation and the storage of these matrices may be time and space consuming.

• The direct determination of the optimal angle in step (2). Having an explicit
expression for the solution is of theoretical interest and is more straightforward from a
computational point of view.

• The outputs: this procedure directly provides the rotated loadings of the quantitative
variables and the rotated principal coordinates of the categories, which are used for
graphical representations after rotation.

4 Numerical studies 

The procedure proposed in this paper for varimax orthogonal rotation in PCAMIX has
been implemented in R. A package called "PCAmixdata" is already available on the CRAN
website. In this section, this algorithm is compared on simulated data with Kiers' rotation
procedure. Then an application to a real data example illustrates the possible benefits of
using rotation in MCA as a particular case of PCAMIX.

4.1 A simulation study: comparison of computational times

An iterative rotation procedure based on Kiers' matrix reformulation has also been
implemented in R. This procedure is that proposed in Section 3.3 with the following
modifications:

• Kiers' original PCAMIX procedure is used in the initialization step in place of the SVD
based PCAMIX procedure.

• All the calculations and outputs based on the matrix $A$ are removed because this
matrix is not part of the original PCAMIX procedure.

• The coefficients $a$ and $b$ in step 2 are calculated according to their expressions (12)
associated with Kiers' matrix reformulation. Note that the ratio $a/b$ is the same with the
two approaches (SVD and matrix reformulation), so the optimal angle $\theta$ is the same.

• In step 4 the squared loadings are calculated with their expression in the original
PCAMIX approach.

The computation times of the two rotation procedures (the one based on Kiers' matrix
reformulation and the one based on the SVD approach of PCAMIX) are compared on simulated
datasets with varying parameters: the number $p$ of variables ($p/2$ quantitative and $p/2$
qualitative) and the number $n$ of observations. For each set of parameters $(n, p)$, 20
simulations are drawn. More precisely, the datasets are built using the following procedure:

• A dataset with $n$ observations and $p$ variables is drawn from a multivariate normal
distribution with a covariance matrix $\Sigma = Q'Q$ where $Q$ is a $p \times p$ matrix whose
entries are drawn from a uniform distribution on the interval [0.2; 0.4].

• The $p/2$ last variables are discretized into three equal-count categories. Each dataset
is then made up of $p_1 = p/2$ quantitative variables and $p_2 = p/2$ qualitative variables,
and the total number of categories is $m = 3p/2$.
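The data-generating procedure can be sketched with the Python standard library only. Several details are assumptions made for this illustration and are not stated in the text: a zero mean vector, rank-based terciles to obtain equal-count categories (counts differ by at most one when $n$ is not a multiple of 3), and the category labels `c1`, `c2`, `c3`. The function name `simulate_mixed` is hypothetical.

```python
import random

def simulate_mixed(n, p, seed=0):
    """Draw n observations of p/2 quantitative and p/2 three-category variables."""
    rng = random.Random(seed)
    # Q has entries uniform on [0.2, 0.4]; the covariance matrix is Q'Q.
    Q = [[rng.uniform(0.2, 0.4) for _ in range(p)] for _ in range(p)]
    rows = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # x = Q'g has covariance Q'Q when g is standard normal (zero mean assumed).
        rows.append([sum(Q[i][j] * g[i] for i in range(p)) for j in range(p)])
    half = p // 2
    quant = [row[:half] for row in rows]
    qual = [[None] * (p - half) for _ in range(n)]
    for j in range(half, p):
        # Rank the observations on variable j and cut the ranks into terciles.
        order = sorted(range(n), key=lambda i: rows[i][j])
        for rank, i in enumerate(order):
            qual[i][j - half] = "c%d" % (3 * rank // n + 1)
    return quant, qual

quant_demo, qual_demo = simulate_mixed(n=50, p=10)
```

Discretizing by rank rather than by fixed thresholds keeps the three categories balanced whatever the variance of the underlying variable.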

Because the two rotation procedures iterate planar rotations until convergence, we compare
their computation times for $k = 2$. The median computation times (over the 20 replications)
are given in Table 1 and the ratios between the computation times of the two approaches are
given in Table 2.

Table 1: Median computation time (in seconds) of two PCAMIX rotation procedures: the
one based on Kiers' matrix reformulation and the one based on the SVD approach.

Table 2: Ratio between the median computation times of the two rotation procedures (Matrix
reformulation / SVD).

          p=10     p=50     p=100    p=200
n=50       2.9      2.0      1.8      1.6
n=100      8.7      3.8      3.3      3.0
n=200     23.2     10.3      7.0      6.4
n=400     69.4     27.7     19.0     14.2
n=800    214.1     77.4     52.9     error

Table 1 shows that the SVD approach is faster than the matrix reformulation approach for
all configurations. For configurations where $p = 10$, Table 2 shows that the SVD approach
is from 3 times faster for $n = 50$ to 214 times faster for $n = 800$. For configurations with
greater values of $p$, this ratio is smaller but still increases with $n$. For the
configuration where $n$ and $p$ are both large ($n = 800$ and $p = 200$), an error occurs with
the rotation procedure based on Kiers' matrix reformulation: the maximum memory capacity of
the computer was reached. This error occurs during the calculation of the $p$ matrices $S_j$
of size $n \times n$. This confirms the computational efficiency of the proposed SVD approach.
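A back-of-the-envelope calculation makes the memory argument concrete for the configuration $n = 800$, $p = 200$. The sketch assumes dense storage with 8 bytes per entry (double precision); the function names are hypothetical, and the figures are lower bounds that ignore any temporary copies made during the computation.

```python
def bytes_for_dense(rows, cols, itemsize=8):
    """Bytes needed for a dense rows x cols matrix of itemsize-byte numbers."""
    return rows * cols * itemsize

def kiers_storage(n, p):
    """Storage for the p quantification matrices S_j, each of size n x n."""
    return p * bytes_for_dense(n, n)

def svd_storage(n, p1, m):
    """Storage for the single n x (p1 + m) recoded matrix Z."""
    return bytes_for_dense(n, p1 + m)

n, p = 800, 200
p1, m = p // 2, 3 * (p // 2)  # p/2 quantitative variables, 3 categories each

print(kiers_storage(n, p) / 1e9, "GB for the S_j matrices")
print(svd_storage(n, p1, m) / 1e6, "MB for the matrix Z")
```

Under these assumptions the quantification matrices need roughly a gigabyte, against a few megabytes for $Z$, and the gap grows linearly with $n$.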

4.2 A real data application 

This real data application illustrates the interest of rotation in MCA. A food habits
survey [1] was carried out in 1999 on students living in the region "Aquitaine" in south-west
France. We focus on the answers of 2885 students to 12 binary questions concerning their
consumption at breakfast (coffee, cereals, eggs...). The PCAMIX method (equivalent here to
MCA) has been applied to this dataset and the first 4 components have been rotated.

In Figure 1 the association of the variables with the first two components is clearly
easier to read after rotation. This rotation of the first four components leads in Table 3
to clear associations between the binary variables: coffee is associated with milk, eggs with
cheese and deli, bread with jam and cereals with pure milk. The effect of the rotation on the
objects' scores and on the categories' coordinates can also be visualized in Figures 2 and 3.
The interpretation rule associated with the barycentric property remains true after rotation.

[1] This survey was carried out by the Bordeaux School of Public Health (Institut de Santé
Publique, d'Épidémiologie et de Développement - ISPED).

Table 3: Correlation ratios (squared loadings) between the variables and the first 4
components before and after rotation.

Figure 1: Plots of the correlation ratios between the variables and the first two components
before rotation and after rotation.

Note that for binary variables MCA and PCA lead to equivalent object scores and squared
loadings (squared correlations are equal to correlation ratios). Considering the data as
quantitative in PCAMIX (equivalent to PCA in that case) therefore gives the same results,
except for the plots of the categories, which are not defined in that case.

5 Conclusion 

We have given in this paper an SVD based formulation of the PCAMIX method. This
new formulation leads to an efficient procedure for varimax rotation in PCAMIX, for which a
direct solution for the optimal angle of rotation $\theta$ has been obtained. The simulation
results have shown that this procedure is computationally more efficient than the
procedure based on Kiers' matrix reformulation. The real data application has shown the
interest of this algorithm in the context of MCA, with graphical representations of both
variables and categories after rotation. The PCAMIX procedure as well as the rotation
procedure have been implemented in the R package "PCAmixdata".

Figure 2: Plots of the (standardized) scores of the 2885 students on the first two components
before and after rotation. [Panels: "Scores before rotation" (horizontal axis: Dimension 1)
and "Scores after rotation" (horizontal axis: Dimension 1 after rotation).]

Figure 3: Plots of the category coordinates on the first two components before and after
rotation.
Appendix

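Define the complex numbers

$$\alpha_s \overset{\mathrm{def}}{=} a_{s,1} + i\, a_{s,2}, \qquad z_j \overset{\mathrm{def}}{=} \sum_{s \in I_j} \alpha_s^2 = u_j + i\, v_j,$$

$$\tilde\alpha_s \overset{\mathrm{def}}{=} e^{-i\theta} \alpha_s = \tilde a_{s,1} + i\, \tilde a_{s,2}, \qquad \tilde z_j \overset{\mathrm{def}}{=} \sum_{s \in I_j} \tilde\alpha_s^2 = e^{-2i\theta} z_j = \tilde u_j + i\, \tilde v_j,$$

where $\tilde a_{s,1}, \tilde a_{s,2}$ have been defined in (14), $u_j, v_j$ in (18), and where
$\tilde u_j, \tilde v_j$ are given by the same formula as $u_j, v_j$, but with a tilde over
$a_{s,1}, a_{s,2}$.

We introduce now a complex-valued varimax function $F(\theta)$ of the rotation angle $\theta$ by:

$$F(\theta) \overset{\mathrm{def}}{=} p \sum_{j=1}^{p} \tilde z_j^2 - \Big( \sum_{j=1}^{p} \tilde z_j \Big)^2 = e^{-4i\theta} F(0),$$

where $F(0)$ is simply obtained by suppressing the tilde in $F(\theta)$. Development of
$F(\theta)$ gives:

$$F(\theta) = \underbrace{p \sum_{j=1}^{p} (\tilde u_j^2 - \tilde v_j^2) - \Big( \sum_{j=1}^{p} \tilde u_j \Big)^2 + \Big( \sum_{j=1}^{p} \tilde v_j \Big)^2}_{g(\theta)} + \underbrace{2i \Big( p \sum_{j=1}^{p} \tilde u_j \tilde v_j - \sum_{j=1}^{p} \tilde u_j \sum_{j=1}^{p} \tilde v_j \Big)}_{i h(\theta)}. \tag{21}$$

Comparison with the formulas (16), (17), (18) defining $b, a, \rho, \psi$ shows that:

$$F(0) = g(0) + i h(0) = b + i a = \rho\, e^{i\psi}.$$

Hence:

$$F(\theta) = \rho\, e^{i(\psi - 4\theta)} = \rho \left\{ \cos(4\theta - \psi) - i \sin(4\theta - \psi) \right\}.$$

But derivation of the varimax function $f(\theta)$ defined in (13) gives, using the fact that
$\tilde a_{s,1}'(\theta) = \tilde a_{s,2}(\theta)$ and $\tilde a_{s,2}'(\theta) = -\tilde a_{s,1}(\theta)$:

$$p f'(\theta) = 2p \sum_{j=1}^{p} \tilde u_j \tilde v_j - 2 \sum_{j=1}^{p} \tilde u_j \sum_{j=1}^{p} \tilde v_j = h(\theta) = -\rho \sin(4\theta - \psi) \tag{22}$$

$$= a \cos 4\theta - b \sin 4\theta, \tag{23}$$

and (22) proves (15) by integration.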